Skip to content

Add support for git commit verification#3905

Open
omehegan wants to merge 6 commits into
feat/git-checkout-featuresfrom
owen/SUP-6535
Open

Add support for git commit verification#3905
omehegan wants to merge 6 commits into
feat/git-checkout-featuresfrom
owen/SUP-6535

Conversation

@omehegan
Copy link
Copy Markdown
Contributor

@omehegan omehegan commented May 7, 2026

Description

Adds commit-on-branch verification to the checkout phase.

As part of our agent checkout improvement work, we are adding support for git commit verification. This addresses a potential security issue: if an attacker adds a malicious commit to the repo, and is then able to trigger a build which specifies commit: 3v1l5ha123 branch: main we would build that commit as if it were on the main branch, without verifying that this is the case, which could potentially lead to a production deployment of malicious code.

This change introduces a BUILDKITE_GIT_COMMIT_VERIFICATION environment variable, which defaults to false. If set to true, when we are given a commit and branch, before checkout and build, we do the following:

  1. Use git merge-base --is-ancestor to check if the commit really belongs to the specified branch.
  2. If it does, proceed; if it definitely does not, fail and exit.
  3. In ambiguous situations, it's possible that we only have a shallow clone of the repo. In that case we:
  4. Deepen the clone by 50 commits, with the assumption that this should be enough to find the specified SHA.
  5. If we still get an ambiguous result, to a full unshallow and check again.
  6. Proceed if the commit matches, fail and exit if it does not.

In terms of alternatives, the only change I considered was just going straight to unshallow before verifying any commit. But I didn't want users to have to completely give up shallow commits to get verification, so I went with the "deepen first, then unshallow if necessary" approach.

Context

See SUP-6535.

Changes

Adds a BUILDKITE_GIT_COMMIT_VERIFICATION env var to enable to functionality. When set, we perform various git operations in between fetch and checkout, to make sure the provided commit is valid on the provided branch.

Testing

  • Tests have run locally (with go test ./...). Buildkite employees may check this if the pipeline has run automatically.
  • Code is formatted (with go tool gofumpt -extra -w .)

Disclosures / Credits

I used Claude to help me plan the feature and teach me some Go fundamentals. I coded the core functionality myself, and let Claude write the tests.

@omehegan omehegan requested review from a team as code owners May 7, 2026 06:36
@omehegan omehegan marked this pull request as draft May 7, 2026 06:36
@omehegan omehegan changed the base branch from main to feat/git-checkout-features May 8, 2026 03:37
@omehegan omehegan marked this pull request as ready for review May 14, 2026 04:32
Copy link
Copy Markdown
Contributor

@zhming0 zhming0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the value in this change, but I left some questions re overall directions and the cost / gain asymmetry. Also I recommend relocating some codes to another file.

Comment thread internal/job/checkout.go Outdated
Comment on lines +883 to +896
if e.Tag != "" {
return nil
}

// Skip if this is a PR build — the commit may be on a merge ref, not the target branch
if e.PullRequest != "" {
return nil
}

// Skip if a custom refspec is set — the fetch may not populate standard branch refs,
// making ancestry verification unreliable
if e.RefSpec != "" {
return nil
}
Copy link
Copy Markdown
Contributor

@zhming0 zhming0 May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I totally see the value of this change adding a level of protection, I worry that it's a bit narrow given all these bypass conditions.

Also on the direction, it's ultimately saying: "under X case, do not trust Buildkite backend". It's a questionable direction for us to invest in though we have prior arts (I think). But I do wonder if we could/should strengthen the backend to not send false information.

It's not strictly a blocker, but something worth a number of words to explain about.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"under X case, do not trust Buildkite backend"

There's plenty of prior art for client-side protections: pre-bootstrap hook for job validation, Signed Pipelines, allowed-env-vars flag, allowed-plugins flag, etc.

In the original Buildkite model, where the backend doesn't have access to the code, the backend can't determine whether or not a particular commit is on a particular branch. I'm not sure how we would strengthen the backend to send only commits on the intended branch without requiring code access.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Josh about the client-side protections. There's no way for us to verify that a commit is or isn't on the claimed branch without having a checkout of the repo. Theoretically there are ways that we could do this with the GitHub API before triggering a build, but such a solution has potential access issues, race conditions, and rate limit implications.

In terms of the bypass conditions you mentioned, those are all scenarios in which commit verification isn't meaningful. For example:

HEAD commit - we're not building a specific commit, we're building whatever is at HEAD, we're not being told to build something specific.
Empty branch - if we're not given a branch name, then there are no branch-based conditions to exploit.
PR builds - If it's a PR build, there's no risk of pretending that it's on main.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I don't see another way out either for this particular defense so I have no objection. My main point is mainly around holistically if this prevention mechanism is meaningful enough when an attacker gained BK API access.

Could the attacker specify refspec (which bypass the protection) + branch at the same time? I didn't trace that code path so a bit unsure.

So not a blocker, mainly a question for my understanding.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I got you - so what I would say is that there are two attack vectors here. The first is webhook manipulation - someone creates a branch and commits malicious code to it, then spoofs a webhook that includes that commit SHA, with branch: main. Refspec isn't part of this payload, so that concern isn't valid here. Without verification, this is enough for us to build a malicious commit with main permissions.

The second vector is what you're thinking about, BK API access. As I mentioned in my other comment, if someone compromises an API token, we have bigger things to worry about. They can do more damage than what verification is going to prevent. You're right that with that access they could bypass verification by sending a bogus refspec. But we have to skip verification in the custom refspec case, because we have to assume that main resolves to something meaningful locally, in order to check if a provided commit SHA legitimately exists on it or not. A custom refspec breaks that assumption, so we have to skip attempting verification or else risk false failures.

Let me know if all that makes sense!

Comment thread internal/job/checkout.go
Comment thread internal/job/checkout.go Outdated
Comment on lines +844 to +846
// Still 128 - full unshallow as last resort
e.shell.Commentf("Deepening insufficient, performing a full unshallow...")
_ = e.shell.Command("git", "fetch", "--unshallow").Run(ctx)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the cost outweigh the gain here? Now the equation become, if someone put a random commit SHA in the build api, our agent will basically do a repo clone -> which depending on situation can take a long time.

An realistically, building for a old commit in CI is pretty rare 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you're suggesting - do you think we should never go through the unshallow step, because it will make builds slower if an attacker sends a random commit via the build API? If this person already has API access, we have bigger things to worry about. The more likely attack scenario is someone putting a legitimate malicious commit on a branch and tricking us into building it as though it was on main (i.e. the pipeline runs with main branch permissions, could deploy, etc.).

You're thinking about the legitimate scenario of building an "old" commit where the "deepen by 50 commits" might not be enough. But in a busy monorepo, a rebuild from even a day ago could be more than 50 commits behind, so we'd have to unshallow to do the verification. A user is going to expect that. I'm just trying to avoid scenarios where we block legitimate builds.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah soz I should be clearer, I think there is pretty big jump between check 50 commits to a full repo scan.

I wonder if it's an legitimate commit in a busy repo, how likely will we trigger the full repo scan.

This isn't a blocker, just something I am trying to understand.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Yeah it's a tradeoff, and I did think about what the right balance was. First, this only matters for people using shallow clones, which AFAIK is a small percentage of current users. Second, if you do shallow clones on a repo that isn't very busy, deepen 50 is probably going to be enough - especially for people who are doing something reasonable like depth 10 rather than depth 1. It's only very busy branches where, if I understand your concern, deepen 50 might not be enough, but a full unshallow would be expensive. In that scenario you might wish for an intermediate deepen 250 step or something, before the full unshallow. Is that what you have in mind? If so, I think what I'd suggest is that we stick with the current solution and see how it goes. If this starts to become a problem, a potential solution would be to make the deepen amount configurable, instead of hard-coded to 50 commits. So for a busy repo, if this keeps coming up, they could change it to a value that suits them.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 happy to give this a shot.

Copy link
Copy Markdown
Contributor

zhming0 commented May 22, 2026

Another thing we raised in the Agent Office hour is the big feature branch approach. We recommend targeting the main branch: 1. to derisk the big bang deployment. 2. V4 is happening in parallel, we worry that you would hit non-trivial conflicts if we hold a long running feature branch. Oz and Ben might be able to share a bit more context, they were on the meeting.

What do you think?

@omehegan
Copy link
Copy Markdown
Contributor Author

@zhming0 yeah I'm catching up on our internal thread about that topic. I was just following the example we were already using for this work, but since it sounds like folks would rather have us merge to main in discrete chunks, I'm fine with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants